Kernel Banach Space


Learning Functional Transduction: S.I. Contents

Neural Information Processing Systems

We provide below the proofs of the results presented in the main text, building on the RKBS framework developed in (Zhang et al., 2009; Song et al., 2013) and on the vector-valued construction of (Giles, 1967). The condition in Eq. (9), which requires the corresponding quantity to vanish for all $j \le n$ and all $u \in \mathcal{U}$, allows us to say that $\mathcal{O}$ belongs to an RKBS (Corollary 3.2 of Zhang (2013)), which we recall hereafter after first introducing the necessary definitions for any linear operator. We show our result in the case $J = 1$; it extends directly to any cardinality $J$. Specifically, we tested three expressions; the first two yield similar results in the ADR experiment at an equal compute cost. We also tried a 'branch' and 'trunk' network formulation of the model, as in DeepONet (Lu et al.). Table S.2 summarizes the architectural hyperparameters used to build the Transducer in the four experiments; 'Depth' corresponds to the number of network layers and 'MLP dim' to the dimensionality of the hidden layer. As stated, we used the same meta-training procedure for all experiments; Table S.3 summarizes the meta-learning hyperparameters used to meta-train the Transducer in our four experiments. Figure S.1 shows examples of the sampled functions $\delta(x)$ and $\nu(x)$ used to build the operators $\mathcal{O}$. We train Transducers for 200K gradient steps. The fluid dynamics experiments rely on the ΦFlow library (Holl et al., 2020), which allows for batched and differentiable simulations of fluid dynamics. Figure S.5 shows the magnitude of the complex coefficients of the Fourier transform of an example pair of input and output functions. To tackle the high-resolution climate modeling experiment, we take inspiration from Pathak et al. (2022), which combines neural operators with patch splitting, and use $L = 12$ in order to match the number of trainable parameters.


Vector-Valued Reproducing Kernel Banach Spaces for Neural Networks and Operators

Dummer, Sven, Heeringa, Tjeerd Jan, Iglesias, José A.

arXiv.org Machine Learning

Recently, there has been growing interest in characterizing the function spaces underlying neural networks. While shallow and deep scalar-valued neural networks have been linked to scalar-valued reproducing kernel Banach spaces (RKBS), $\mathbb{R}^d$-valued neural networks and neural operator models remain less understood in the RKBS setting. To address this gap, we develop a general definition of vector-valued RKBS (vv-RKBS), which inherently includes the associated reproducing kernel. Our construction extends existing definitions by avoiding restrictive assumptions such as symmetric kernel domains, finite-dimensional output spaces, reflexivity, or separability, while still recovering familiar properties of vector-valued reproducing kernel Hilbert spaces (vv-RKHS). We then show that shallow $\mathbb{R}^d$-valued neural networks are elements of a specific vv-RKBS, namely an instance of the integral and neural vv-RKBS. To also explore the functional structure of neural operators, we analyze the DeepONet and Hypernetwork architectures and demonstrate that they too belong to an integral and neural vv-RKBS. In all cases, we establish a Representer Theorem, showing that optimization over these function spaces recovers the corresponding neural architectures.
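For orientation (a standard statement, not quoted from the paper), the vector-valued RKHS property that such constructions generalize reads as follows: for a space $\mathcal{H}$ of functions from $\mathcal{X}$ to a Hilbert space $\mathcal{Y}$, point evaluations are reproduced by an operator-valued kernel $K$,

$$ \langle f(x), y \rangle_{\mathcal{Y}} = \langle f, K(\cdot, x)\, y \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H},\ x \in \mathcal{X},\ y \in \mathcal{Y}, \qquad K(x, x') \in \mathcal{L}(\mathcal{Y}). $$

In the Banach setting the inner products become duality pairings with a (possibly non-reflexive, non-separable) dual space, which is why the assumptions listed above are the delicate points of the construction.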


Mirror Descent on Reproducing Kernel Banach Spaces

Kumar, Akash, Belkin, Mikhail, Pandit, Parthe

arXiv.org Machine Learning

Recent advances in machine learning have led to increased interest in reproducing kernel Banach spaces (RKBS) as a more general framework that extends beyond reproducing kernel Hilbert spaces (RKHS). These works have resulted in the formulation of representer theorems under several regularized learning schemes. However, little is known about an optimization method that encompasses these results in this setting. This paper addresses a learning problem on Banach spaces endowed with a reproducing kernel, focusing on efficient optimization within RKBS. To tackle this challenge, we propose an algorithm based on mirror descent (MDA). Our approach involves an iterative method that employs gradient steps in the dual space of the Banach space using the reproducing kernel. We analyze the convergence properties of our algorithm under various assumptions and establish two types of results: first, we identify conditions under which a linear convergence rate is achievable, akin to optimization in the Euclidean setting, and provide a proof of the linear rate; second, we demonstrate a standard convergence rate in a constrained setting. Moreover, to instantiate this algorithm in practice, we introduce a novel family of RKBSs with $p$-norm ($p \neq 2$), characterized by both an explicit dual map and a kernel.
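As a rough sketch (generic mirror descent, omitting the paper's kernel-based dual map), the iteration alternates a gradient step in the dual space with a pullback through the mirror map $\psi$:

$$ \theta_{t+1} = \theta_t - \eta_t \, \nabla \mathcal{L}(w_t), \qquad w_{t+1} = \nabla \psi^{*}(\theta_{t+1}), \qquad \theta_t = \nabla \psi(w_t). $$

For the $p$-norm mirror map $\psi(w) = \tfrac{1}{p}\|w\|_p^p$ with conjugate exponent $q = p/(p-1)$, both $\nabla\psi$ and $\nabla\psi^{*}$ have closed forms ($\nabla\psi(w)_i = \mathrm{sign}(w_i)|w_i|^{p-1}$ and $\nabla\psi^{*}(\theta)_i = \mathrm{sign}(\theta_i)|\theta_i|^{q-1}$), which is the kind of explicit dual map the proposed family of RKBSs is designed to supply.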


A Lipschitz spaces view of infinitely wide shallow neural networks

Bartolucci, Francesca, Carioni, Marcello, Iglesias, José A., Korolev, Yury, Naldi, Emanuele, Vigogna, Stefano

arXiv.org Machine Learning

We revisit the mean field parametrization of shallow neural networks, using signed measures on unbounded parameter spaces and duality pairings that take into account the regularity and growth of activation functions. This setting directly leads to the use of unbalanced Kantorovich-Rubinstein norms defined by duality with Lipschitz functions, and of spaces of measures dual to those of continuous functions with controlled growth. These allow us to make transparent the need for total variation and moment bounds, or penalization, to obtain existence of minimizers of variational formulations: under such bounds we prove a compactness result in the strong Kantorovich-Rubinstein norm, and in their absence we exhibit several examples of undesirable behavior. Further, the Kantorovich-Rubinstein setting enables us to combine the advantages of a completely linear parametrization, and the ensuing reproducing kernel Banach space framework, with optimal transport insights. We showcase this synergy with representer theorems and uniform large data limits for empirical risk minimization, and in proposed formulations for distillation and fusion applications.
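For reference (standard mean-field notation rather than the paper's exact conventions), the parametrization in question represents a shallow network through a signed measure $\mu$ on the parameter space $\Omega$,

$$ f_\mu(x) = \int_{\Omega} \sigma\big(\langle w, x \rangle + b\big)\, \mathrm{d}\mu(w, b), $$

and one common convention for the Kantorovich-Rubinstein (bounded Lipschitz) norm is $\|\mu\|_{\mathrm{KR}} = \sup\{\int \varphi \, \mathrm{d}\mu : \|\varphi\|_\infty \le 1,\ \mathrm{Lip}(\varphi) \le 1\}$; the growth and regularity conditions on $\sigma$ are what make this pairing well defined on unbounded parameter spaces.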


Which Spaces can be Embedded in $L_p$-type Reproducing Kernel Banach Space? A Characterization via Metric Entropy

Lu, Yiping, Lin, Daozhe, Du, Qiang

arXiv.org Machine Learning

In this paper, we establish a novel connection between metric entropy growth and the embeddability of function spaces into reproducing kernel Hilbert/Banach spaces. Metric entropy characterizes the information complexity of function spaces and has implications for their approximability and learnability. Classical results show that embedding a function space into a reproducing kernel Hilbert space (RKHS) implies a bound on its metric entropy growth. Surprisingly, we prove a \textbf{converse}: a bound on the metric entropy growth of a function space allows it to be embedded into an $L_p$-type reproducing kernel Banach space (RKBS). This shows that $L_p$-type RKBSs provide a broad modeling framework for learnable function classes with controlled metric entropies. Our results shed new light on the power and limitations of kernel methods for learning complex function spaces.
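For reference, the metric entropy referred to here is the standard one: the logarithm of the covering number of the class $\mathcal{F}$ at scale $\varepsilon$,

$$ H(\varepsilon, \mathcal{F}, d) = \log N(\varepsilon, \mathcal{F}, d), \qquad N(\varepsilon, \mathcal{F}, d) = \min\Big\{ m : \exists\, f_1, \dots, f_m \text{ such that } \mathcal{F} \subseteq \bigcup_{i=1}^{m} B_d(f_i, \varepsilon) \Big\}. $$

A bound on how fast $H(\varepsilon, \mathcal{F}, d)$ grows as $\varepsilon \to 0$ quantifies the size of the class; the converse result states that such a bound alone already yields an embedding into an $L_p$-type RKBS.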


Decomposition of one-layer neural networks via the infinite sum of reproducing kernel Banach spaces

Shin, Seungcheol, Kang, Myungjoo

arXiv.org Artificial Intelligence

In this paper, we define the sum of RKBSs using the characterization theorem of RKBSs and show that the sum of RKBSs is compatible with the direct sum of feature spaces. Moreover, we decompose the integral RKBS into a sum of $p$-norm RKBSs. Finally, we give applications to the structural understanding of the integral RKBS class.
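One standard convention for the sum of two Banach spaces of functions on a common domain (possibly differing in detail from the paper's characterization-theorem-based definition) equips $\mathcal{B}_1 + \mathcal{B}_2$ with the norm

$$ \|f\|_{\mathcal{B}_1 + \mathcal{B}_2} = \inf\big\{ \|f_1\|_{\mathcal{B}_1} + \|f_2\|_{\mathcal{B}_2} : f = f_1 + f_2,\ f_i \in \mathcal{B}_i \big\}, $$

extended to countable families by summing the component norms; compatibility with the direct sum of feature spaces then means that a feature map for the sum can be assembled from feature maps of the summands.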


Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime

Shilton, Alistair, Gupta, Sunil, Rana, Santu, Venkatesh, Svetha

arXiv.org Machine Learning

This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth and topology, assuming only finite-energy neural activations; and a novel representor theory for neural networks in terms of a matrix-valued kernel. The first model is exact (un-approximated) and global, casting the neural network as an element of a reproducing kernel Banach space (RKBS); we use this model to provide tight bounds on Rademacher complexity. The second model is exact and local, casting the change in the network function resulting from a bounded change in weights and biases (i.e., a training step) in a reproducing kernel Hilbert space (RKHS) in terms of a local-intrinsic neural kernel (LiNK). This local model provides insight into model adaptation through tight bounds on the Rademacher complexity of network adaptation. We also prove that the neural tangent kernel (NTK) is a first-order approximation of the LiNK kernel. Finally, noting that the LiNK does not provide a representor theory for technical reasons, we present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK). This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models. Throughout the paper, (a) feedforward ReLU networks and (b) residual networks (ResNet) are used as illustrative examples.
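For context, the neural tangent kernel that the LiNK is stated to approximate to first order is the standard Gram matrix of parameter gradients at a reference weight vector $\theta_0$,

$$ K_{\mathrm{NTK}}(x, x') = \nabla_\theta f_\theta(x)\, \nabla_\theta f_\theta(x')^{\top} \Big|_{\theta = \theta_0}, $$

which linearizes the network around $\theta_0$; the LiNK is instead described as capturing the exact change in the network function under a bounded change in weights and biases.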


Neural reproducing kernel Banach spaces and representer theorems for deep networks

Bartolucci, Francesca, De Vito, Ernesto, Rosasco, Lorenzo, Vigogna, Stefano

arXiv.org Machine Learning

Studying the function spaces defined by neural networks helps to understand the corresponding learning models and their inductive bias. While in some limits neural networks correspond to function spaces that are reproducing kernel Hilbert spaces, these regimes do not capture the properties of the networks used in practice. In contrast, in this paper we show that deep neural networks define suitable reproducing kernel Banach spaces. These spaces are equipped with norms that enforce a form of sparsity, enabling them to adapt to potential latent structures within the input data and their representations. In particular, leveraging the theory of reproducing kernel Banach spaces, combined with variational results, we derive representer theorems that justify the finite architectures commonly employed in applications. Our study extends analogous results for shallow networks and can be seen as a step towards considering more practically plausible neural architectures.
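To indicate the flavor of such results (stated here in the shallow case, not in the paper's deep formulation), a representer theorem for an RKBS $\mathcal{B}$ with a sparsity-enforcing norm asserts that

$$ \min_{f \in \mathcal{B}} \; \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) + \lambda \|f\|_{\mathcal{B}} $$

admits a minimizer realized by a finite-width network $f(x) = \sum_{j=1}^{N} a_j\, \sigma(\langle w_j, x \rangle + b_j)$ with $N$ controlled by the number of data points $n$; the results in this paper derive analogous finite representations for deep architectures.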


Hypothesis Spaces for Deep Learning

Wang, Rui, Xu, Yuesheng, Yan, Mingsong

arXiv.org Machine Learning

Deep learning has been a huge success in applications. Mathematically, its success is due to the use of deep neural networks (DNNs), neural networks with multiple layers, to describe decision functions. Various mathematical aspects of DNNs as an approximation tool were investigated recently in a number of studies [9, 11, 13, 16, 20, 27, 28, 31]. As pointed out in [8], learning processes do not take place in a vacuum. Classical learning methods take place in a reproducing kernel Hilbert space (RKHS) [1], which leads to a representation of learning solutions in terms of a combination of a finite number of kernel sections [19] of a universal kernel [17]. Reproducing kernel Hilbert spaces, as appropriate hypothesis spaces for classical learning methods, provide a foundation for the mathematical analysis of these methods. A natural and imperative question is what the appropriate hypothesis spaces for deep learning are. Although hypothesis spaces for learning with shallow neural networks (networks with one hidden layer) were investigated recently in a number of studies (e.g.
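The classical RKHS result alluded to here is the representer theorem: for a kernel $K$ with RKHS $\mathcal{H}_K$, regularized empirical risk minimization

$$ f^{\star} \in \operatorname*{arg\,min}_{f \in \mathcal{H}_K} \; \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) + \lambda \|f\|_{\mathcal{H}_K}^{2} \quad \Longrightarrow \quad f^{\star} = \sum_{i=1}^{n} c_i\, K(\cdot, x_i) $$

for some coefficients $c_i$, i.e. the solution is a finite combination of kernel sections at the data points; the question raised here is which spaces can play the analogous role for deep networks.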